The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings
نویسندگان
چکیده
English as a Second Language (ESL) learners’ writings contain various grammatical errors. Previous research on automatic error correction for ESL learners’ grammatical errors deals with restricted types of learners’ errors. Some types of errors can be corrected by rules using heuristics, while others are difficult to correct without statistical models using native corpora and/or learner corpora. Since adding error annotation to learners’ text is time-consuming, it was not until recently that large scale learner corpora became publicly available. However, little is known about the effect of learner corpus size in ESL grammatical error correction. Thus, in this paper, we investigate the effect of learner corpus size on various types of grammatical errors, using an error correction system based on phrase-based statistical machine translation (SMT) trained on a large scale errortagged learner corpus. We show that the phrase-based SMT approach is effective in correcting frequent errors that can be identified by local context, and that it is difficult for phrase-based SMT to correct errors that need long range contextual information.
منابع مشابه
The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and Its Application to Grammatical Error Correction
This paper introduces the freely available WikEd Error Corpus. We describe the data mining process from Wikipedia revision histories, corpus content and format. The corpus consists of more than 12 million sentences with a total of 14 million edits of various types. As one possible application, we show that WikEd can be successfully adapted to improve a strong baseline in a task of grammatical e...
متن کاملGrammatical Error Correction Considering Multi-word Expressions
Multi-word expressions (MWEs) have been recognized as important linguistic information and much research has been conducted especially on their extraction and interpretation. On the other hand, they have hardly been used in real application areas. While those who are learning English as a second language (ESL) use MWEs in their writings just like native speakers, MWEs haven’t been taken into co...
متن کاملBuilding a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English
We describe the NUS Corpus of Learner English (NUCLE), a large, fully annotated corpus of learner English that is freely available for research purposes. The goal of the corpus is to provide a large data resource for the development and evaluation of grammatical error correction systems. Although NUCLE has been available for almost two years, there has been no reference paper that describes the...
متن کاملGrammatical Error Correction of English as Foreign Language Learners
This study aimed to discover the insight of error correction by implementing two correction systems on three Iranian university students. The three students were invited to write four in-class essays throughout the semester, in which their verb errors and individual-selected errors were corrected using the Code Correction System and the Individual Correction System. At the end of the study, the...
متن کاملUsing an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System
This paper presents research on building a model of grammatical error correction, for preposition errors in particular, in English text produced by language learners. Unlike most previous work which trains a statistical classifier exclusively on well-formed text written by native speakers, we train a classifier on a large-scale, error-tagged corpus of English essays written by EFL learners, rel...
متن کامل